Entity Identification in Documents Expressing Shared Relationships

Authors

  • JOHN R. TALBURT
  • NINGNING WU
  • ELIZABETH PIERCE
  • CHIA-CHU CHIANG

Abstract

This paper addresses the problem of entity identification in documents in which key identity attributes are missing. The most common approach is to take a single entity reference and determine the “best match” of its attributes to a set of candidate identities selected from an appropriate entity catalog. This paper describes a new technique of multiple-reference, shared-relationship identity resolution that can be employed when a document references several entities that share a specific relationship, a situation that often occurs in published documents. It also describes the results obtained from a recent test of the multiple-reference, shared-relationship identity resolution technique applied to obituary notices. The preliminary results show that the multiple-reference technique can provide higher quality identification results than single-reference matching in cases where a shared relationship is asserted.

Keywords: Entity Identification, Entity Resolution, Identity Management, Feature Extraction, Text Mining, Obituary Notices
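To make the contrast between single-reference "best match" identification and multiple-reference, shared-relationship resolution concrete, here is a toy sketch. It is not the authors' algorithm. The catalog, the `household` field standing in for the shared relationship, the exact-match attribute scoring, and the function names `attribute_score`, `single_reference_match`, and `shared_relationship_match` are all hypothetical, chosen only for illustration.

```python
from itertools import product

# Hypothetical entity catalog. The "household" field is an assumed stand-in
# for whatever shared relationship a document asserts (e.g., spouses).
CATALOG = [
    {"id": "P1", "first": "MARY", "last": "SMITH", "household": "H1"},
    {"id": "P2", "first": "MARY", "last": "SMITH", "household": "H2"},
    {"id": "P3", "first": "JOHN", "last": "SMITH", "household": "H2"},
    {"id": "P4", "first": "JOHN", "last": "SMITH", "household": "H3"},
]


def attribute_score(ref, identity):
    """Naive score: number of reference attributes that exactly match the identity."""
    return sum(1 for key, value in ref.items() if identity.get(key) == value)


def single_reference_match(ref, catalog):
    """Single-reference matching: return every identity tied for the best score."""
    scores = [(attribute_score(ref, c), c["id"]) for c in catalog]
    best = max(score for score, _ in scores)
    return [cid for score, cid in scores if score == best]


def shared_relationship_match(refs, catalog, relation_key="household"):
    """Multiple-reference matching: choose one distinct identity per reference
    such that all chosen identities share the same relation_key value,
    maximizing the total attribute score. Returns None if no choice qualifies."""
    best_ids, best_total = None, -1
    # Exhaustive search over every assignment of catalog entries to references.
    for combo in product(catalog, repeat=len(refs)):
        ids = [c["id"] for c in combo]
        if len(set(ids)) != len(ids):
            continue  # different references must resolve to different identities
        if len({c[relation_key] for c in combo}) != 1:
            continue  # the asserted shared relationship must hold for all of them
        total = sum(attribute_score(r, c) for r, c in zip(refs, combo))
        if total > best_total:
            best_ids, best_total = ids, total
    return best_ids


if __name__ == "__main__":
    deceased = {"first": "MARY", "last": "SMITH"}
    spouse = {"first": "JOHN", "last": "SMITH"}
    print(single_reference_match(deceased, CATALOG))  # ['P1', 'P2']
    print(shared_relationship_match([deceased, spouse], CATALOG))  # ['P2', 'P3']
```

A lone reference to Mary Smith ties between P1 and P2, but requiring that she and a co-referenced John Smith share a household leaves only the pair P2 and P3. The exhaustive search with `itertools.product` grows exponentially with the number of references, so it only suits small illustrations; a practical system would prune the candidate sets first.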

Similar resources

An Optimal Approach to Local and Global Text Coherence Evaluation Combining Entity-based, Graph-based and Entropy-based Approaches

Text coherence evaluation has become a vital task in Natural Language Processing subfields such as text summarization, question answering, text generation, and machine translation. Existing methods, such as entity-based and graph-based models, focus on how the roles of nouns and noun phrases change across sequential sentences within a short part of a text. They also have limitations in global coheren...

PAYMA: A Tagged Corpus of Persian Named Entities

The goal in the named entity recognition task is to classify proper nouns of a piece of text into classes such as person, location, and organization. Named entity recognition is an important preprocessing step in many natural language processing tasks such as question-answering and summarization. Although many research studies have been conducted in this area in English and the state-of-the-art...

ABRIR at NTCIR-9 GeoTime Task Usage of Wikipedia and GeoNames for Handling Named Entity Information

ABRIR at NTCIR-8 approach: construction of a Boolean query using pseudo-relevant documents; named entities are treated as more important than other terms; a verb synonym list is used to increase coverage; the Boolean IR model is combined with a probabilistic IR model (Okapi); a penalty is applied to documents that don't satisfy the Boolean query. F...

Graph-Community Detection for Cross-Document Topic Segment Relationship Identification

In this paper we propose a graph-community detection approach to identify cross-document relationships at the topic segment level. Given a set of related documents, we automatically find these relationships by clustering segments with similar content (topics). In this context, we study how different weighting mechanisms influence the discovery of word communities that relate to the different to...

Automatic Identification of Concepts and Conceptual Relations from Patents Using Machine Learning Methods

This paper presents a machine learning approach to automatically extract concepts and conceptual relations towards the creation of Conceptual Graphs (CGs) from patent documents using a shallow parser and NER. The main challenge in the creation of conceptual graphs from natural language texts is the automatic identification of concepts and conceptual relations. The texts analyzed in this work ...

Publication date: 2007